Abstract

The power of networks manifests itself in a highly non-linear 
amplification of a number of effects, and their weakness — 
in propagation of cascading failures. The potential sys- 
temic risk effects can be either exacerbated or mitigated, 
depending on the resilience characteristics of the network. 
The goals of this paper are to study some characteristics of 
network amplification and resilience. We simulate random 
Erdos-Renyi networks and measure amplification by vary- 
ing node capacity, transaction volume, and expected failure 
rates. We discover that network throughput scales almost 
quadratically with respect to the node capacity and that the 
effects of excessive network load and of random, irreparable
node faults are equivalent and almost perfectly
anticorrelated. This knowledge can be used by capacity
planners to determine reliability requirements that maximize
the optimal operational region.

1 INTRODUCTION 

The power of networks manifests itself in a highly non-linear
amplification of a number of effects. Probably the most
notable example is the spread of epidemics, whereby a disease
can spread from one person to many through a human dynamics
network [14]. The evolution of life on Earth shows how
species diversity exploded once cells started interacting
with each other and sex was invented. This formed a
complex network that amplified the effects of natural
selection [B] . Without this amplification, natural selection
would have progressed linearly at best, taking much more than
four billion years to create today's natural world. In
finance, the 2007 crisis exemplified how defaults on a small
number of mortgages brought down the likes of Lehman
Brothers. At the microbiology level, RNA networks have been
shown to generate complex functions that amplify metabolism,
in fact creating real chemical factories [2]. Finally, the
biggest enigma is how networks of neurons amplify sensory
inputs to create cognition [13].

The flip side of amplification is network resilience (or
lack of it). Systemic risk assessment in the financial
supply chain is a case where the ability to evaluate the
potential negative impact of blockages in the global
financial supply chain may prove crucial to the well-being
of the banking sector and the macroeconomy. As pointed
out in [11], atomistic risk management in financial systems
is unsatisfactory as it fails to give a salient risk assessment
both at the nodes and for the entire system. Indeed, the
effect of an idiosyncratic event at one node may propagate
to other nodes, thereby resulting in financial instability or
catastrophic failure of other nodes or of the entire network.
Propagation of cascading failure can happen in many types
of interdependent systems [4]. The potential systemic risk
effects can be either exacerbated or mitigated, depending on
the resilience characteristics of the network.

A vast body of research has focused on the impact
of the structure of the networks and nodes. Phase
transitions in networks [12] [16], whereby giant components form
when density reaches a critical point, are frequently used to
study diffusion in networks [15]. Percolation analysis has
been used to measure the structural importance of particular
nodes in terrorist networks [9] and soccer games [7]. A
number of studies have shown how the structure of a network
impacts fault tolerance, showing for instance that scale-free
networks are particularly vulnerable to targeted attacks
and immune to random attacks, whereas random networks
are the opposite [1] [10].

The goals of this paper are to study some characteristics
of network amplification and resilience. For this study, we
chose a stylized version of a network where nodes process
generic transactions requiring certain capacity and
processing time. We particularly focus on the relationship
between the resilience of a network and the resilience of a
typical node. When individual services become part of a
larger complex network that powers a firm's operations, the
overall reliability is less than the reliability of each
individual component: there is an amplification of fault.

In this study we minimize the effects of structure by using 
random networks [S] with homogeneous nodes. We measure 
amplification by varying node capacity, transaction volume, 
and expected failure rates. 

2 NETWORK CONFIGURATION 

In this paper, we simulate and discuss a random
Erdos-Renyi network [S] that consists of N = 1,600 identical
nodes representing network hosts.

Each network node represents a server that can
simultaneously execute up to C independent subtransactions (the
nature of the subtransactions is not essential for this study).
Each subtransaction takes time t_0 to complete (the time
does not depend on the total load on the node). The
network is simulated for the duration of S t_0.



The density of the network is d, that is, of all possible
N(N - 1) directional connections, only dN(N - 1) are
realized. The network has no loop-back connections.
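
The construction described above can be sketched in Python as follows. This is a minimal illustration and not the simulator used in the study: the function name `make_random_digraph` is hypothetical, and the sketch assumes that each directed edge is realized independently with probability d, so that the number of edges equals dN(N - 1) only in expectation.

```python
import random

def make_random_digraph(n, d, seed=None):
    """Return adjacency lists of a random directed graph.

    Assumption: each of the n*(n-1) ordered pairs (u, v) with u != v
    becomes an edge independently with probability d. Self-loops
    (loop-back connections) are never created.
    """
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u != v and rng.random() < d:
                neighbors[u].append(v)
    return neighbors
```

Calling `make_random_digraph(1600, 0.2)` would produce a graph of the size used in the study; the realized density then fluctuates slightly around 0.2.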

In addition to being able to execute transactions, each
node can also be used for injecting transactions into the
network (serve as a transaction source) and for terminating
transactions, either by way of committing or aborting (serve
as a transaction sink). During the simulation, transactions
are injected uniformly across the network. The delays
between subsequent transactions are drawn from the
exponential distribution E(1/r), where r is the mean injection
rate.
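
The injection process can be illustrated with a short Python sketch. The function name `injection_schedule` and its parameters are hypothetical; the sketch assumes that time is measured in the same units as 1/r and that "uniformly across the network" means a uniformly random source node for every transaction.

```python
import random

def injection_schedule(rate, duration, n_nodes, seed=None):
    """Return a list of (time, source_node) injection events.

    Delays between consecutive injections are exponential with mean
    1/rate (random.expovariate takes the rate, not the mean); each
    transaction enters the network at a uniformly chosen node.
    """
    rng = random.Random(seed)
    events = []
    t = rng.expovariate(rate)
    while t < duration:
        events.append((t, rng.randrange(n_nodes)))
        t += rng.expovariate(rate)
    return events
```

Over a long run, the number of generated events is close to rate × duration, which is what makes r the mean injection rate.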

All transactions injected in the network are distributed.
A master transaction T consists of L subtransactions T_i
(i ∈ {1, ..., L}; in our study, L is drawn from the discrete
normal distribution N(10, 4), adjusted to exclude negative
values of L). A master transaction is committed if all its
subtransactions are committed. Otherwise, the master
transaction is aborted. The transaction manager is implied and
not simulated.
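
A possible reading of the transaction length distribution and of the commit rule is sketched below. Several details are assumptions rather than facts from the text: the second parameter of N(10, 4) is taken to be the standard deviation, the draw is rounded to the nearest integer, and draws below 1 are rejected and repeated. The function names are hypothetical.

```python
import random

def sample_length(rng, mean=10.0, std=4.0):
    """Draw the number of subtransactions L of one master transaction.

    Assumptions: std=4 is the standard deviation, the normal draw is
    rounded to the nearest integer, and values below 1 are redrawn.
    """
    while True:
        length = round(rng.gauss(mean, std))
        if length >= 1:
            return length

def master_commits(sub_outcomes):
    """A master transaction commits only if every subtransaction commits."""
    return all(sub_outcomes)
```

With these assumptions the rejection step is triggered in fewer than 1% of the draws, so the mean of L stays close to 10.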

Transactions are routed using an opportunistic routing 
strategy: the node for the next subtransaction is chosen 
uniformly at random from all neighbors of the current node. 
If the next node is disabled, then another neighbor is chosen. 
It is possible for the next subtransaction T_{i+1} to be executed
by the same node as the previous subtransaction T_{i-1}. If all
neighbors are disabled, the subtransaction is aborted, and 
the master transaction rolls back. 
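
The routing rule can be sketched as follows. The function name `next_node` and the data layout (adjacency lists plus a set of disabled nodes) are hypothetical. The sketch picks uniformly among the enabled neighbors in one step, which yields the same distribution as repeatedly redrawing a neighbor until an enabled one is found.

```python
import random

def next_node(current, neighbors, disabled, rng=None):
    """Pick the node for the next subtransaction, or None to abort.

    neighbors: adjacency lists; disabled: set of shut-down nodes.
    Returns None when every neighbor of `current` is disabled, in
    which case the subtransaction is aborted.
    """
    rng = rng or random.Random()
    alive = [v for v in neighbors[current] if v not in disabled]
    if not alive:
        return None
    return rng.choice(alive)
```

Because the network has no loop-back connections, the current node never appears among its own neighbors, but a transaction may bounce back to the node it visited one step earlier.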

Once injected in the network, a transaction has a
time-to-live of 60 t_0. Since it takes the constant time of 1 t_0
for a transaction to clear a node, the fraction of
transactions that are subject to aborting due to the timeout is
approximately 1.2 × 10^-35 (based on the normal distribution
of L), and this behavior may be ignored.
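
The order of magnitude of the timeout fraction can be checked with a few lines of Python. The sketch assumes that the second parameter of N(10, 4) is the standard deviation and ignores both the truncation at small L and any routing delays; the function name `timeout_fraction` is hypothetical.

```python
import math

def timeout_fraction(ttl=60.0, mean=10.0, std=4.0):
    """Upper-tail probability P(L > ttl) of a normal distribution.

    Assumption: std=4 is the standard deviation of L, and a
    transaction times out only if it has more than ttl
    subtransactions of duration t_0 each.
    """
    z = (ttl - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Under these assumptions the tail is on the order of 10^-36 for a threshold of 60 and on the order of 10^-35 for a threshold of 59.5, which brackets the figure quoted above; the exact value depends on how L is discretized.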

We assume that distributed transactions in our network
are not independent (they do not have the ACID property,
which is not uncommon for distributed transactions due to
Brewer's theorem [3]). In our model, if a transaction is
aborted for any reason, all other transactions that crossed
paths with it in the past T time units (T = 10 t_0) are also
aborted with probability p_0 = 0.01.
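
One step of this dependency rule might look as follows in Python. The record format (a list of transaction ids with the times at which they shared a node with the aborted transaction) and the function name `cascade_aborts` are assumptions made for illustration; times are expressed in units of t_0. The sketch handles a single abort and does not follow the secondary aborts that the victims may trigger in turn.

```python
import random

def cascade_aborts(aborted_at, crossings, p0=0.01, window=10.0, rng=None):
    """Return ids of transactions aborted together with a failed one.

    crossings: list of (txn_id, time) pairs, one per transaction that
    crossed paths with the failed transaction at the given time.
    Only crossings within `window` time units before `aborted_at`
    are considered; each one aborts with probability p0.
    """
    rng = rng or random.Random()
    victims = set()
    for txn_id, when in crossings:
        recent = aborted_at - window <= when <= aborted_at
        if recent and rng.random() < p0:
            victims.add(txn_id)
    return victims
```

Setting p0 to 0 makes transactions fully independent, while p0 = 1 forces every recent crossing to roll back.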

The network nodes can become disabled in two ways. 
First, when a node is overloaded (the actual load at a node 
reaches or exceeds its capacity C), it shuts down. In real life, 
an overload-related shutdown may be caused by overheating, 
network congestion, excessive swapping or other resource 
constraints. 

Second, the network nodes may fail randomly after an
initial delay drawn from the exponential distribution E(T_f).
These random failures simulate the effect of internal
unreliability. Shorter times to failure correspond to less
reliable nodes.

Initially, all nodes in a network are alive and can per- 
form their tasks. Once disabled, however, a node is not 
restarted and remains disabled for the rest of the simulation 
run — recovery may not be feasible or even possible in au- 
tonomous unmanaged networks (say, sensor networks [5]). 
All subtransactions currently executed at a disabled node, 
and the corresponding master transactions, are aborted. 

The network simulator has been implemented in C++ 
using a discrete event simulation package developed at the 
Mathematics and Computer Science Department of Suffolk 
University. 
Figure 1: Experimental scenarios; the dashed and the dotted
lines are hypothetical phase boundaries 

3 SIMULATION 

To study the effect of node failures on the network resilience
and to propose and evaluate resilience measures, we
conducted several numerical experiments, some of which are
schematically presented in Figure 1.

In each experiment, the network has been simulated for
a variety of combinations of node capacities and average
edge densities (C, d): d ∈ {0.01, 0.011, 0.015, 0.025, 0.04,
0.055, 0.075, 0.1, 0.2, 0.3, 0.5, 0.6, 0.75, 0.85, 0.99} and
C ∈ {2, 3, 4, ..., 22} (up to C = 23 for select density values).

3.1 Failing by Overloading 

In the first experiment, we started with a fully functional
network with no injected transactions. Then we gradually
increased the injection rate from 0 to r_0 (arrow A in
Figure 1) until at least 10^-6 of all transactions would abort.
Since this mode of operation is essentially lossless, we call
it superconductive; r_0 is defined as the maximum abort-free
rate.

By injecting more than r_0 transactions per time unit, we
partially overload the network and switch it into a resistive
mode. The fraction of aborted transactions monotonically
increases with the transaction injection rate r, until at some
point the network chokes (all network nodes become
overloaded and shut down) before the end of a simulation run
(arrow B in Figure 1). We denote the maximum choke-free
injection rate as r_1, and we call the operation mode beyond
it dielectric. In the same spirit, we call r_0 and r_1 phase
transition injection rates.
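
The three operation modes can be summarized as a small classification rule applied to the outcome of one simulation run. The function name `classify_phase` and its inputs are hypothetical; the 10^-6 threshold is the one used above to define the maximum abort-free rate.

```python
def classify_phase(abort_fraction, choked, threshold=1e-6):
    """Label one simulation run with its operation mode.

    abort_fraction: aborted / injected transactions in the run.
    choked: True if all nodes shut down before the run ended.
    """
    if choked:
        return "dielectric"
    if abort_fraction < threshold:
        return "superconductive"
    return "resistive"
```

Sweeping the injection rate upward and recording where the label changes would then give estimates of the two phase transition injection rates.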

Both r_0 and r_1 depend on the simulation running time
(shorter runs allow the network to terminate choke-free for
higher injection rates). However, the difference between
shorter runs of S t_0 and longer runs of 2S t_0 is within 5%. All
further results have been obtained for S = 84,600 t_0 ("one
day").

Since r_1 is the highest meaningful injection rate, we will
sometimes normalize injection rates by introducing ρ_0 =
r_0/r_1 and ρ = r/r_1. We have 0 ≤ {ρ_0, ρ} ≤ 1.

For each network configuration, we measured r_0 and r_1.
Figure 2 shows both the experimental points and the best
fit approximations that will be discussed in Section 4.




Figure 2: Phase transition injection rates r_0 (below the
dashed line) and r_1 (above the dashed line), in transactions
per t_0, vs node capacity C, for various network densities d;
solid lines represent best fit approximations



Figure 4: Phase diagram for C = 4 and d = 0.2. Plus "+"
and minus "-" signs indicate resistive and dielectric points,
respectively. The solid line is the best fit phase boundary
Figure 3: Phase transition node fault rate m_0 vs node
capacity C, for various network densities d; solid lines
represent best fit approximations



3.2 Failing by Internal Faults

In the second experiment, just like in the first one, we started
with a fully functional network with no injected
transactions, and gradually increased the injection rate to r_0 (the
network is still in the superconductive state). Then, at the
fixed injection rate, we started failing random nodes after
random delays, simulating unrecoverable internal faults
(arrow C in Figure 1).

At the end of each simulation run, we measured the
fraction of failed nodes m and the state of the network (either
resistive or dielectric). Let m_0 be the smallest m that causes
the network to choke and switch to the dielectric state. We
call it the phase transition node fault rate. Figure 3 shows both
the experimental values of m_0 and the best fit
approximations.

3.3 Failing by Overloading and Internal Faults

Finally, in the third experiment we combined the two
mechanisms that cause network failures.

Indeed, we learned from the previous two experiments
that the phase transition between the resistive and dielectric
states takes place at the points Z1 (m = 0, ρ = 1) and Z2
(m = m_0, ρ = ρ_0) in Figure 1. These points correspond to
the first and second experiments. Point Z3 (m = 1, ρ = 0)
is also on the phase boundary (it takes all nodes to be faulty
to fail a network in the presence of zero traffic).

To locate the rest of the phase boundary (tentatively
shown as a dashed line in Figure 1), we executed a number
of simulation runs that moved the network from the
original state O to various boundary states Z', Z", etc. (arrows
D' and D") by simultaneously varying the injection rate
and the proportion of the internally faulty nodes. The
result of this experiment was a phase transition diagram for
the network for each tested configuration (C, d). The
diagrams show the boundary between the resistive and
dielectric states (we did not instrument the simulator to detect the
boundary between the resistive and superconductive states,
though random experiments suggest that it probably follows
the dotted line in Figure 1). An example of a phase diagram
for C = 4 and d = 0.2 is shown in Figure 4.



3.4 Dependent Transactions 

We explored the relationship between the transaction
interdependency probability p_0 and the simulated values of r_0,
r_1, and m_0. Of them, only r_1 is statistically correlated with
p_0: changing p_0 from 0 to 1 increases r_1 by approximately 4%.
Indeed, in the presence of strong correlation between
transactions, an aborted transaction always causes a cascaded
rollback that, in turn, relieves network congestion and allows a
higher injection rate, at the cost of a lower commit rate.

4 DISCUSSION 

4.1 Dense and Sparse Networks 

We observed that in all simulated scenarios, the network
behavior is determined, in the first place, by the network
density d. The borderline between different behaviors is
fuzzy and lies in the range d_0 = [0.01 ... 0.02]. In the dense
networks (d > d_0), most performance characteristics do not
depend on d, while in the sparse networks (d < d_0), the
dependence on d is strong to the extent that many network
measures diverge as d tends to 0.

4.2 Amplification 

One goal of the study was to find the correlation between
node capacity C (which corresponds to the material
investment into the networking infrastructure) and the aggregate
network throughput expressed either in terms of r_0 or r_1.
The relationship between C and r_0 and r_1 for various
network densities d is shown in Figure 2.

For both dense and sparse networks, both r_0(C) and
r_1(C) can be approximated using a power function:

r_i(C) \approx A_i (C - 2)^{\beta_i}.    (1)

The exponents β_i for the dense networks are ≈ 1.7 and ≈
2.1, respectively. Both β_i's tend to 1 as d tends to 0. The
mantissas A_i for the dense networks are ≈ 0.7 and ≈ 2.8,
respectively. Both A_i increase and possibly diverge as d
tends to 0.

We observed the quadratic amplification effect: doubling 
node capacity almost quadruples the throughput. 
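
The amplification effect can be made concrete with a short calculation based on Eq. 1. The function names below are hypothetical; the exponents 1.7 and 2.1 are the dense-network values quoted above.

```python
def throughput(capacity, a, beta):
    """Power-law approximation of Eq. 1: r(C) = A * (C - 2)**beta."""
    return a * (capacity - 2) ** beta

def doubling_gain(capacity, beta):
    """Ratio r(2C) / r(C) under Eq. 1; the factor A cancels out."""
    return ((2 * capacity - 2) / (capacity - 2)) ** beta
```

For large C the gain approaches 2^β, that is, about 3.2 for β = 1.7 and about 4.3 for β = 2.1. Because of the offset in (C - 2), the gain is larger at moderate capacities: `doubling_gain(10, 1.7)` evaluates (18/8)^1.7, which is close to 4.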

4.3 Effect of Faulty Nodes 

We could not easily find an explainable closed form
approximation of m_0(C). Eq. 2 seems to be in good agreement
with the experimental results (Figure 3):

m_0(C) \approx \frac{(A - 1)\,\mathrm{erf}\left(\frac{\log_{10}(C - 2) - \mu}{\sigma}\right) + (A + 1)}{2}.    (2)

The purpose of Eq. 2 is chiefly to estimate the dependencies
between m_0 and C, not to predict them. In particular,
we are not sure at this point if, as C tends to infinity, all
m_0 values tend to 0, to a common positive asymptote, or to
individual positive asymptotes. Exploring this issue would
require more computational resources than we can presently
afford.

Based on the data that we have, we conclude that for the
dense networks, the best fit curves described by Eq. 2
converge to a value of A in the range [0..0.23]. In other words, in
the best case it would take 30% of internally faulty nodes to
fail a dense network with infinite buffer space in the presence
of the highest superconductive injection rate. In the worst
case, the network may fail even with negligibly few faulty
nodes. In general, as the node capacity increases, the
proportion of nodes that must be disabled to choke the network
at the injection rate r_0 decreases. This is not surprising,
since r_0 itself scales up with C, so we expose the network to
higher volumes of traffic.

4.4 Equivalence of Excessive Traffic and Faulty 
Nodes 

Figure 5 shows the node fault rate m_0 vs maximum
superconductive injection rate ρ_0, for various network densities d
(different symbols) and capacities C. The two measures are
closely correlated.

The solid lines represent best fit approximations (Eq. 3).

m_0(\rho_0) \approx 1 - A \rho_0^{\beta}.    (3)

The less dense networks correspond to the lines with the
more horizontal initial segment at ρ_0 = 0.
Figure 5: Node fault rate m_0 vs network load ρ_0, for various
network densities d and capacities C; solid lines represent
best fit approximations; the dashed line is the diagonal of
the 1×1 rectangle

For dense networks, the mantissa A of Eq. 3 tends to 1,
and the exponent β tends to 1.15. For sparse networks, both
parameters grow and possibly diverge as d tends to 0.
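
As a worked example, Eq. 3 can be evaluated with the dense-network limits of its parameters. The function name `fault_rate` is hypothetical, and the default values A = 1 and β = 1.15 are the limits quoted above rather than fitted values for any specific configuration.

```python
def fault_rate(rho0, a=1.0, beta=1.15):
    """Eq. 3: m_0 = 1 - A * rho0**beta (dense-network defaults)."""
    return 1.0 - a * rho0 ** beta
```

With these defaults, `fault_rate(0.0)` is 1 and `fault_rate(1.0)` is 0, so the curve connects the two corners of the unit square and stays slightly above its diagonal in between; `fault_rate(0.75)` is about 0.28.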

To a first approximation, the relationship between the
network resilience parameters ρ_0 and m_0 is almost linear,
with the slope of -1. This means that tolerating additional
superconductive traffic Δρ_0 (that is, narrowing the gap
between superconducting-resistive and resistive-dielectric
injection rates) is equivalent to disabling extra network nodes
Δm_0 due to internal faults, and the other way around:

\Delta\rho_0 \approx -\Delta m_0.    (4)

Each line in Figure 5 corresponds to a particular network
density d, and the experimental points along the line
correspond to various node capacities C. The points with higher
values of C have higher ρ_0 and lower m_0. If the statement
about the asymptotic behavior of m_0 with respect to C (that
we made in Subsection 4.3) is true, then there is a
convergence point ≈ (0.75, 0.3) on the chart. No network would
be able to sustain a higher relative superconductive injection
rate or choke with fewer faulty nodes.

4.5 Combined External and Internal Effects 

In the first two experiments, we either exposed a healthy 
network to excessive traffic or faulted random nodes carrying 
the highest sustainable superconductive traffic. We found 
these mechanisms complementary and even commensurable 
(especially for dense networks). 

In reality, a network can be simultaneously subject both
to internal irreparable faults and external excessive traffic.
Figure 4 shows the network phase transition diagram for
C = 4 and d = 0.2 for all possible values of 0 ≤ m ≤ 1
(including m_0) and 0 ≤ ρ ≤ 1 (including ρ_0). The
boundary between the choking and choke-free areas was calculated
using best fit parameter estimation for Eq. 5:

m(\rho) \approx 1 - A \rho^{\beta}.    (5)

We found that A ≈ 1 is almost independent of either C
or d. In contrast, β is independent of d but diminishes
from 1.35 to 1.15 as C increases from 2 to 9: the dependence
of m on ρ is more linear for higher capacity networks.

Incidentally, Eq. 5 is identical to Eq. 3 aside from the
actual values of β (which are still the same in both equations
for C > 9). At present, we do not know whether this is a
coincidence or a rule.
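
The text does not state how the best fit parameters of the boundary were obtained. One simple possibility, shown here purely as an illustration, is an ordinary least-squares fit on a log-log scale, since the power law above is equivalent to log(1 - m) = log A + β log ρ. The function name `fit_boundary` and the input format are hypothetical.

```python
import math

def fit_boundary(points):
    """Least-squares fit of m = 1 - A * rho**beta on a log-log scale.

    points: sequence of (rho, m) boundary points with 0 < rho <= 1
    and 0 <= m < 1. Returns the pair (A, beta).
    This is an assumed fitting method, not necessarily the one
    used in the study.
    """
    points = list(points)
    xs = [math.log(rho) for rho, _ in points]
    ys = [math.log(1.0 - m) for _, m in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    a = math.exp(mean_y - beta * mean_x)
    return a, beta
```

Points with ρ = 0 or m = 1 have to be left out because their logarithms are undefined, and at least two distinct values of ρ are needed for the slope to be determined.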

5 CONCLUSION 

Our research revealed a number of interesting characteris- 
tics of random transactional networks. We studied trans- 
action failures as a function of two factors, random node 
faults and incoming transaction volume. These revealed 
three phases of particular interest: "superconductive" (no 
transactions fail), "resistive" (some transactions fail), and
"dielectric" (all transactions fail). We found that the
injection rates associated with the phase transitions scale almost
quadratically with respect to the node capacity, thus
providing network throughput amplification and allowing capacity
planners to determine optimal reliability requirements that
maximize the superconductive region.

We also found that at the resistive-to-dielectric phase
transition, the effects of excessive network load and
internal, spontaneous, and irreparable node faults are equivalent
and almost perfectly anticorrelated. This knowledge can be
used to compensate for faults in isolated unmanaged networks
by properly and predictably adjusting external traffic or to
determine the number of spare nodes needed to sustain
predictable bursts of traffic.

Further study is required to quantify and qualify the ef- 
fects of network structure including density, modularity, and 
assortativity. 

As discussed in the introduction, the overriding goal of 
this research is to study the power of networks in general. 
Further study will generalize the findings to include a larger 
class of networks and applications whereby resilience will 
be substituted by capability. For example, such general- 
ization will open new research areas in economic develop- 
ment whereby economic productivity is a result of complex 
economic networks. This research could also potentially be 
applied to our understanding of systemic risk and effective 
governance, to name a few, through a greater appreciation 
of network dynamics. 